---
title: 'SDPM Chunker (Legacy)'
description: 'Semantic Double-Pass Merging chunker - now integrated into SemanticChunker'
icon: 'layer-group'
---

<Warning>
**Deprecated as of v1.2.0**

The SDPM (Semantic Double-Pass Merging) functionality has been integrated into the main `SemanticChunker`. 

**Recommended Migration:**
```python
# Old way (deprecated)
from chonkie.legacy import SDPMChunker
chunker = SDPMChunker(skip_window=1)

# New way (recommended)
from chonkie import SemanticChunker
chunker = SemanticChunker(skip_window=1)
```

The new SemanticChunker provides all SDPM capabilities plus additional improvements like Savitzky-Golay filtering for better boundary detection.
</Warning>

The `SDPMChunker` extends semantic chunking by using a double-pass merging approach. It first groups content by semantic similarity, then merges similar groups within a skip window, allowing it to connect related content that may not be consecutive in the text.
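To make the second pass concrete, here is a small standalone sketch of skip-window merging. This is a conceptual illustration only, not chonkie's internal implementation: groups are represented as lists of embedding vectors, and the `cosine` and `mean` helpers are simplified stand-ins.

```python
from typing import List

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a list of vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]

def double_pass_merge(groups: List[List[Vector]],
                      threshold: float,
                      skip_window: int) -> List[List[Vector]]:
    """Second pass: merge a group with a later group up to
    `skip_window` positions past its neighbour when their mean
    embeddings are similar enough. `groups` is the output of the
    first (consecutive-similarity) pass."""
    merged: List[List[Vector]] = []
    i = 0
    while i < len(groups):
        current = list(groups[i])
        next_i = i + 1
        for j in range(i + 1, min(i + 2 + skip_window, len(groups))):
            if cosine(mean(current), mean(groups[j])) >= threshold:
                # Absorb every group between i and j, inclusive.
                for g in groups[i + 1 : j + 1]:
                    current.extend(g)
                next_i = j + 1
        merged.append(current)
        i = next_i
    return merged
```

With `skip_window=1`, a group can merge with the group two positions ahead even when the group in between is dissimilar, which is how SDPM reconnects related but non-consecutive content.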

## Why Use the New SemanticChunker Instead?

The new `SemanticChunker` includes all SDPM functionality plus:
- **Better performance**: Optimized C extensions for faster processing
- **Smoother boundaries**: Savitzky-Golay filtering for noise reduction
- **Cleaner API**: Simplified parameter names and improved defaults
- **Active development**: Ongoing improvements and bug fixes
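For context on the filtering improvement: Savitzky-Golay smoothing fits local polynomials to a noisy sequence, which suppresses sentence-to-sentence noise so topic-boundary dips in the similarity curve stand out. The snippet below is a conceptual sketch using SciPy's `savgol_filter` on simulated similarity scores, not the chunker's internal code.

```python
import numpy as np
from scipy.signal import savgol_filter

# Simulated sentence-to-sentence similarity scores: two topic
# segments (high similarity within each) plus random noise.
rng = np.random.default_rng(42)
scores = np.concatenate([
    np.full(10, 0.8),   # first topic
    np.full(10, 0.3),   # second topic
]) + rng.normal(0.0, 0.05, 20)

# Smooth with a 5-point window and a quadratic local fit.
smoothed = savgol_filter(scores, window_length=5, polyorder=2)

# Within each segment the smoothed curve is flatter, so the
# drop between segments is easier to detect as a boundary.
print(smoothed.round(2))
```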

## Legacy Installation

If you need to use the legacy version for compatibility:

```bash
pip install "chonkie[semantic]"
```

Then import from the legacy module:

```python
from chonkie.legacy import SDPMChunker
```

## Legacy Usage

<Note>
This documentation is preserved for users who need to maintain existing code using SDPMChunker. For new projects, please use the main [SemanticChunker](./semantic-chunker).
</Note>

### Basic Initialization

```python
from chonkie.legacy import SDPMChunker

# Legacy initialization
chunker = SDPMChunker(
    embedding_model="minishlab/potion-base-32M",  # model identifier or embedding instance
    threshold=0.5,      # similarity threshold (0-1) or "auto"
    chunk_size=2048,    # maximum tokens per chunk
    min_sentences=1,    # minimum sentences per chunk
    skip_window=1       # chunks to skip when merging
)
```

### Legacy Parameters

The legacy SDPMChunker uses these parameters (several are renamed or removed in the new SemanticChunker):

- `embedding_model`: Model identifier or embedding instance
- `mode`: "cumulative" or "window" (removed in new version)
- `threshold`: Similarity threshold (0-1) or "auto"
- `chunk_size`: Maximum tokens per chunk
- `similarity_window`: Sentences for threshold calculation
- `min_sentences`: Minimum sentences per chunk (now `min_sentences_per_chunk`)
- `min_chunk_size`: Minimum tokens per chunk (removed in new version)
- `min_characters_per_sentence`: Minimum characters per sentence
- `threshold_step`: Step size for threshold calculation (removed in new version)
- `skip_window`: Number of chunks to skip when merging

### Example Migration

#### Old Code (Legacy)
```python
from chonkie.legacy import SDPMChunker

chunker = SDPMChunker(
    embedding_model="minishlab/potion-base-32M",
    mode="window",
    threshold="auto",
    chunk_size=512,
    min_sentences=1,
    min_chunk_size=2,
    skip_window=1
)

chunks = chunker.chunk(text)
for chunk in chunks:
    print(f"Sentences: {len(chunk.sentences)}")
```

#### New Code (Recommended)
```python
from chonkie import SemanticChunker

chunker = SemanticChunker(
    embedding_model="minishlab/potion-base-32M",
    threshold=0.7,  # Explicit threshold instead of "auto"
    chunk_size=512,
    min_sentences_per_chunk=1,  # Renamed parameter
    skip_window=1  # Same functionality
)

chunks = chunker.chunk(text)
for chunk in chunks:
    print(f"Token count: {chunk.token_count}")
```

## Return Type Changes

### Legacy Return Type
The legacy SDPMChunker returns `SemanticChunk` objects with sentence details:

```python
@dataclass
class SemanticChunk:
    text: str
    start_index: int
    end_index: int
    token_count: int
    sentences: List[SemanticSentence]  # Detailed sentence information
```

### New Return Type
The new SemanticChunker returns simpler `Chunk` objects:

```python
@dataclass
class Chunk:
    text: str
    start_index: int
    end_index: int
    token_count: int
    # No sentence details - cleaner and more efficient
```
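If existing code depends on `chunk.sentences`, one way to keep it working after migrating is to re-split the chunk text yourself. The splitter below is a deliberately naive sketch; the only API it assumes is the `Chunk` fields shown above.

```python
import re
from dataclasses import dataclass
from typing import List

@dataclass
class Chunk:
    text: str
    start_index: int
    end_index: int
    token_count: int

def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

chunk = Chunk(
    text="First sentence. Second one! A third?",
    start_index=0,
    end_index=36,
    token_count=8,
)
print(len(split_sentences(chunk.text)))  # 3
```

Note that this approximation will not match the legacy chunker's sentence boundaries exactly (for instance around abbreviations), but it covers the common case of iterating over sentences within a chunk.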

## Full Legacy Documentation

For users who must use the legacy version, the complete original functionality remains available:

```python
from chonkie.legacy import SDPMChunker

# All original parameters still work
chunker = SDPMChunker(
    embedding_model="minishlab/potion-base-32M",
    mode="window",
    threshold="auto",
    chunk_size=2048,
    similarity_window=1,
    min_sentences=1,
    min_chunk_size=2,
    min_characters_per_sentence=12,
    threshold_step=0.01,
    delim=[". ", "! ", "? ", "\n"],
    include_delim="prev",
    skip_window=1
)

# Original methods preserved
chunks = chunker.chunk(text)
batch_chunks = chunker.chunk_batch(texts)
```

## Support

While the legacy SDPMChunker remains available for backward compatibility, it is no longer actively developed. Please consider migrating to the new SemanticChunker for:
- Better performance
- Active bug fixes
- New features
- Ongoing support

For migration assistance, see the [SemanticChunker documentation](./semantic-chunker) or open an issue on our [GitHub repository](https://github.com/chonkie-ai/chonkie).